Max-Weight Achieves the Exact $[O(1/V), O(V)]$ Utility-Delay Tradeoff Under Markov Dynamics
Abstract: In this paper, we show that the Quadratic Lyapunov function based Algorithm (QLA, also known as MaxWeight or Backpressure) achieves an exact $[O(1/V), O(V)]$ utility-delay tradeoff in stochastic network optimization problems with Markovian network dynamics. Although the QLA algorithm has been extensively studied, most of the performance results are obtained under i.i.d. network randomness, and it has not been formally proven that QLA achieves the exact $[O(1/V), O(V)]$ utility-delay tradeoff under Markov dynamics. Our analysis uses a combination of duality theory and a variable multi-slot Lyapunov drift argument. This argument differs from previous multi-slot drift analyses in that the slot number is a random variable corresponding to the renewal time of the network randomness. It not only allows us to obtain an exact $[O(1/V), O(V)]$ tradeoff, but also allows us to state the performance of QLA in terms of explicit parameters of the network dynamic process.

Index Terms — Queueing, Dynamic Control, Lyapunov analysis, 
Stochastic Optimization 



I. Introduction 

In this paper, we show that the Quadratic Lyapunov function based Algorithm (QLA, also known as the MaxWeight algorithm) [1] achieves an exact $[O(1/V), O(V)]$ utility-delay tradeoff in the following general stochastic network optimization problem. We are given a discrete time stochastic network. The network state, which describes the network randomness, such as the network channel condition or the random arrivals, is time varying according to some Markov process. A network controller performs some action based on the observed network state at every time slot. The chosen action incurs a cost, but also serves some amount of traffic and possibly generates new traffic for the network. This traffic causes congestion, and thus leads to backlogs at nodes in the network. The goal of the controller is to minimize its time average cost subject to the constraint that the time average total backlog in the network is finite.

This is a very general framework and includes a wide class of networking problems, ranging from flow utility maximization [2], energy minimization [?] and network pricing [?] to cognitive radio applications. Many techniques have been applied to this problem (see [?] for a survey). Among them, the family of Quadratic Lyapunov function based Algorithms (QLA) [1] has recently received much attention, due to its provable performance guarantees, its robustness to stochastic network conditions and, most importantly, its ability to achieve the desired performance without requiring any statistical knowledge of the underlying randomness in the network. When the network state is i.i.d., it has been proven in [1] that QLA can achieve a utility that is within $O(1/V)$ of the optimal utility for any $V > 1$ for general network optimization problems, while guaranteeing an $O(V)$ network delay. Two works [?] construct algorithms to achieve an $[O(1/V), O(\log(V))]$ utility-delay tradeoff using exponential Lyapunov functions. The recent work [9] also develops the Fast-QLA (FQLA) algorithm based on quadratic Lyapunov functions to achieve an $[O(1/V), O([\log(V)]^2)]$ tradeoff.

When the network state is Markovian, it has been shown that when the network backlogs are deterministically bounded, QLA can also achieve utilities within $O(\log(V)/V)$ of the optimal values [4], [?], while guaranteeing that the average delay is $O(V)$. Without such deterministic queueing bounds, it has recently been shown that QLA achieves an $[O(\epsilon + \frac{T_\epsilon}{V}), O(V)]$ tradeoff under Markovian network states [10], where $\epsilon > 0$ and $T_\epsilon$ represent the proximity to the optimal value and the "convergence time" of the QLA algorithm to that proximity, respectively. However, there has not been any proof showing that QLA achieves the exact $[O(1/V), O(V)]$ utility-delay tradeoff under Markovian network dynamics.

In this paper, we present the first proof of the exact $[O(1/V), O(V)]$ tradeoff of the QLA algorithm under Markovian network dynamics. To establish the result, we use a combination of duality theory and a variable multi-slot Lyapunov drift argument. Different from previous multi-slot drift arguments, e.g., [1], where the drift is usually computed over a fixed number of slots, the slot number here is a random variable corresponding to the renewal time of the network dynamic process. This $[O(1/V), O(V)]$ tradeoff result contributes to a better understanding of the QLA algorithm performance and enables more precise resource allocation decisions in network optimization problems. The result can also be combined with the recent result developed in [9] to show that the FQLA algorithm achieves an $[O(1/V), O([\log(V)]^2)]$ tradeoff for stochastic network optimization problems with Markovian network dynamics, and is thus the first known algorithm that can ensure a poly-logarithmic delay performance when pushing the utility performance to within $O(1/V)$ of the optimal in this Markovian case.
This paper is organized as follows. In Section II we set up our notations. We then present our system model in Section III. We review the QLA algorithm in Section IV. The performance results of QLA under the Markovian network dynamics are obtained in Section V.

II. Notations 

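Here we specify our notations. $\mathbb{R}$ represents the set of real numbers. $\mathbb{R}_+$ (or $\mathbb{R}_-$) represents the set of nonnegative (or non-positive) real numbers. $\mathbb{R}^n$ (or $\mathbb{R}^n_+$) represents the set of $n$ dimensional column vectors, with each element being in $\mathbb{R}$ (or $\mathbb{R}_+$). Bold symbols $\boldsymbol{a}$ and $\boldsymbol{a}^T$ represent a column vector and its transpose. $\boldsymbol{a} \succeq \boldsymbol{b}$ indicates that vector $\boldsymbol{a}$ is entrywise no less than vector $\boldsymbol{b}$. $\|\boldsymbol{a} - \boldsymbol{b}\|$ is the Euclidean distance of $\boldsymbol{a}$ and $\boldsymbol{b}$. $\boldsymbol{0}$ is the column vector with all elements being 0.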

III. System Model 

In this section, we specify the general network model we use. We consider a network controller that operates a network with the goal of minimizing the time average cost, subject to the queue stability constraint. The network is assumed to operate in slotted time, i.e., $t \in \{0, 1, 2, \ldots\}$. We assume there are $r \ge 1$ queues in the network.

A. Network State 

In every slot $t$, we use $S(t)$ to denote the current network state, which indicates the current network parameters, such as a vector of channel conditions for each link, or a collection of other relevant information about the current network channels and arrivals. We assume that $S(t)$ evolves according to a general irreducible and aperiodic Markov chain with countably many states and denote its state space by $\mathcal{S} = \{s_1, s_2, s_3, \ldots\}$. We assume $S(t)$ has a well defined steady state distribution, and let $\pi_{s_i}$ denote its steady state probability of being in state $s_i$. Note that in this case, by Theorem 3 in Chapter 5 of [11], the existence of a steady state distribution $\boldsymbol{\pi}$ implies that all the states are positive recurrent, hence $\pi_{s_i} > 0$ for all $i$.

B. The Cost, Traffic, and Service 

At each time $t$, after observing $S(t) = s_i$, the controller chooses an action $x(t)$ from a set $\mathcal{X}^{(s_i)}$, i.e., $x(t) = x^{(s_i)}$ for some $x^{(s_i)} \in \mathcal{X}^{(s_i)}$. The set $\mathcal{X}^{(s_i)}$ is called the feasible action set for network state $s_i$ and is assumed to be time-invariant and compact for all $s_i \in \mathcal{S}$. The cost, traffic, and service generated by the chosen action $x(t) = x^{(s_i)}$ are as follows:

(a) The chosen action has an associated cost given by the cost function $f(t) = f(s_i, x^{(s_i)}) : \mathcal{X}^{(s_i)} \mapsto \mathbb{R}_+$ (or $\mathcal{X}^{(s_i)} \mapsto \mathbb{R}_-$ in reward maximization problems);

(b) The amount of traffic generated by the action to queue $j$ is determined by the traffic function $A_j(t) = A_j(s_i, x^{(s_i)}) : \mathcal{X}^{(s_i)} \mapsto \mathbb{R}_+$, in units of packets;

(c) The amount of service allocated to queue $j$ is given by the rate function $\mu_j(t) = \mu_j(s_i, x^{(s_i)}) : \mathcal{X}^{(s_i)} \mapsto \mathbb{R}_+$, in units of packets.

Note that $A_j(t)$ includes both the exogenous arrivals from outside the network to queue $j$, and the endogenous arrivals from other queues, i.e., the transmitted packets from other queues, to queue $j$. We assume the functions $f(s_i, \cdot)$, $\mu_j(s_i, \cdot)$ and $A_j(s_i, \cdot)$ are time-invariant, their magnitudes are uniformly upper bounded by some constant $\delta_{max} \in (0, \infty)$ for all $s_i$, $j$, and they are known to the network operator. We also assume that there exists a set of actions $\{x_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,2,\ldots,r+2}$ with $x_k^{(s_i)} \in \mathcal{X}^{(s_i)}$ for all $s_i$, and a set of variables $\{\vartheta_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,2,\ldots,r+2}$ with $\sum_k \vartheta_k^{(s_i)} = 1$ and $\vartheta_k^{(s_i)} \ge 0$ for all $s_i$ and $k$ such that:

$$\sum_{s_i} \pi_{s_i} \Big\{ \sum_k \vartheta_k^{(s_i)} \big[ A_j(s_i, x_k^{(s_i)}) - \mu_j(s_i, x_k^{(s_i)}) \big] \Big\} \le -\eta, \qquad (1)$$

for some $\eta > 0$ for all $j$. That is, the queue stability constraints are feasible with $\eta$-slackness. Thus, there exists a stationary randomized policy that stabilizes all queues (where $\vartheta_k^{(s_i)}$ represents the probability of choosing action $x_k^{(s_i)}$ when $S(t) = s_i$). In the following, we use $\boldsymbol{A}(t) = (A_1(t), A_2(t), \ldots, A_r(t))^T$ and $\boldsymbol{\mu}(t) = (\mu_1(t), \mu_2(t), \ldots, \mu_r(t))^T$ to denote the arrival and service vectors at time $t$. It is easy to see from above that if we define:

$$B = \sqrt{r}\,\delta_{max}, \qquad (2)$$

then $\|\boldsymbol{A}(t) - \boldsymbol{\mu}(t)\| \le B$ for all $t$.

C. Queueing, Average Cost, and the Stochastic Problem 

Let $\boldsymbol{q}(t) = (q_1(t), \ldots, q_r(t))^T \in \mathbb{R}^r_+$, $t = 0, 1, 2, \ldots$ be the queue backlog vector process of the network, in units of packets. We assume the following queueing dynamics:

$$q_j(t+1) = \max[q_j(t) - \mu_j(t), 0] + A_j(t) \quad \forall j, \qquad (3)$$

and $\boldsymbol{q}(0) = \boldsymbol{0}$. By using (3), we assume that when a queue does not have enough packets to send, null packets are transmitted.
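As an illustration, the queueing dynamics (3) can be sketched in a few lines of Python. The function name `queue_update` and the arrival and service numbers are hypothetical and chosen only for this example; they are not taken from the paper.

```python
def queue_update(q, mu, A):
    """One slot of q_j(t+1) = max[q_j(t) - mu_j(t), 0] + A_j(t) for every queue j."""
    return [max(qj - mj, 0.0) + aj for qj, mj, aj in zip(q, mu, A)]


# Hypothetical two-queue trace starting from q(0) = 0.
q = [0.0, 0.0]
arrivals = [[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]]
services = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
for A, mu in zip(arrivals, services):
    # Service beyond the current backlog is wasted (null packets are sent).
    q = queue_update(q, mu, A)
```

In the last slot the first queue is offered more service than it has backlog, so the max operator clips it at zero before the new arrival is added.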
In this paper, we adopt the following notion of queue stability:

$$\bar{q} \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{j=1}^{r} E\{q_j(\tau)\} < \infty. \qquad (4)$$

We also use $f_{av}^{\Pi}$ to denote the time average cost induced by an action-choosing policy $\Pi$, defined as:

$$f_{av}^{\Pi} \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{f^{\Pi}(\tau)\}, \qquad (5)$$

where $f^{\Pi}(\tau)$ is the cost incurred at time $\tau$ by policy $\Pi$. We call an action-choosing policy feasible if at every time slot $t$, it only chooses actions from the feasible action set $\mathcal{X}^{(S(t))}$. We then call a feasible action-choosing policy under which (4) holds a stable policy, and use $f_{av}^*$ to denote the optimal time average cost over all stable policies. In every slot, the network controller observes the current network state $S(t)$ and chooses a control action, with the goal of minimizing time average cost subject to network stability. This goal can be mathematically stated as: (P1) min: $f_{av}^{\Pi}$, s.t. (4). In the rest of the paper, we will refer to problem (P1) as the stochastic problem.

IV. QLA and the Deterministic Problem

In this section, we first review the quadratic Lyapunov function based algorithms (the QLA algorithm) [1] for solving the stochastic problem. We then define the deterministic problem and its dual problem, and discuss some properties of the dual function. The dual problem and these properties will be used later for analyzing the performance of QLA.

We first recall the QLA algorithm [1] as follows.

QLA: Initialize the parameter $V > 1$. At every time slot $t$, observe the current network state $S(t)$ and the backlog $\boldsymbol{q}(t)$. If $S(t) = s_i$, choose $x^{(s_i)} \in \mathcal{X}^{(s_i)}$ that solves the following:

$$\max: \; -V f(s_i, x) + \sum_{j=1}^{r} q_j(t) \big[ \mu_j(s_i, x) - A_j(s_i, x) \big] \qquad (6)$$
$$\text{s.t.} \quad x \in \mathcal{X}^{(s_i)}.$$
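A minimal sketch of the per-slot decision (6), assuming the feasible action set of the current state is finite so that the maximization can be done by enumeration. The helper `qla_action`, the power-level actions and the cost and rate functions below are hypothetical, with the current state already folded into `f`, `mu` and `A`.

```python
def qla_action(V, q, actions, f, mu, A):
    """Pick the action maximizing -V*f(x) + sum_j q_j * (mu_j(x) - A_j(x))."""
    def weight(x):
        backlog_term = sum(qj * (m - a) for qj, m, a in zip(q, mu(x), A(x)))
        return -V * f(x) + backlog_term
    return max(actions, key=weight)


# Hypothetical single-queue example for one fixed state: x is a power level.
actions = [0, 1]
f = lambda x: float(x)      # cost: power spent
mu = lambda x: [2.0 * x]    # service rate offered to the queue
A = lambda x: [1.0]         # arrivals, independent of the action here

low = qla_action(10.0, [0.0], actions, f, mu, A)   # empty queue: stay idle
high = qla_action(10.0, [6.0], actions, f, mu, A)  # large backlog: transmit
```

In this toy instance the controller switches from idling to transmitting once the backlog exceeds $V/2$, which illustrates how a larger $V$ trades a larger backlog for a lower cost.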

Depending on the problem structure, (6) can usually be decomposed into separate parts that are easier to solve, e.g., [3], [4]. Also, when the network state process $S(t)$ is i.i.d., it has been shown in [1] that,

$$f_{av}^{QLA} = f_{av}^* + O(1/V), \quad \bar{q}^{QLA} = O(V), \qquad (7)$$

where $f_{av}^{QLA}$ and $\bar{q}^{QLA}$ are the expected average cost and the expected time average network backlog size under QLA, respectively. When $S(t)$ is Markovian, it has been shown in, e.g., [4] and [?], that QLA achieves an $[O(\log(V)/V), O(V)]$ utility-delay tradeoff if the queue sizes are deterministically upper bounded by $\Theta(V)$ for all time. Without this deterministic backlog bound, it has recently been shown that QLA achieves an $[O(\epsilon + \frac{T_\epsilon}{V}), O(V)]$ tradeoff under Markov $S(t)$ processes, where $\epsilon$ and $T_\epsilon$ represent the proximity to the optimal value and the "convergence time" of the QLA algorithm for this proximity [10]. However, this latter tradeoff is less explicit, and it is common that when $S(t)$ is Markovian, $T_\epsilon = \Omega(\log(1/\epsilon))$, in which case we again have an $[O(\log(V)/V), O(V)]$ tradeoff when $\epsilon = 1/V$.

We also recall the deterministic problem defined in [9]:

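$$\min: \; \mathcal{F}(\boldsymbol{x}) \triangleq V \sum_{s_i} \pi_{s_i} f(s_i, x^{(s_i)}) \qquad (8)$$
$$\text{s.t.} \quad \mathcal{A}_j(\boldsymbol{x}) \triangleq \sum_{s_i} \pi_{s_i} A_j(s_i, x^{(s_i)}) \le \mathcal{B}_j(\boldsymbol{x}) \triangleq \sum_{s_i} \pi_{s_i} \mu_j(s_i, x^{(s_i)}) \quad \forall j,$$
$$x^{(s_i)} \in \mathcal{X}^{(s_i)} \quad \forall i = 1, 2, \ldots$$

where $\pi_{s_i}$ corresponds to the steady state probability of $S(t) = s_i$ and $\boldsymbol{x} = (x^{(s_1)}, \ldots, x^{(s_M)})^T$. The dual problem of (8) can be obtained as follows:

$$\max: \; g(\boldsymbol{\gamma}), \quad \text{s.t.} \; \boldsymbol{\gamma} \succeq \boldsymbol{0}, \qquad (9)$$

where $g(\boldsymbol{\gamma})$ is called the dual function and is defined as:

$$g(\boldsymbol{\gamma}) = \inf_{x^{(s_i)} \in \mathcal{X}^{(s_i)}} \sum_{s_i} \pi_{s_i} \Big\{ V f(s_i, x^{(s_i)}) + \sum_j \gamma_j \big[ A_j(s_i, x^{(s_i)}) - \mu_j(s_i, x^{(s_i)}) \big] \Big\}. \qquad (10)$$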

Here $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_r)^T$ is the Lagrange multiplier of (8). It is well known that $g(\boldsymbol{\gamma})$ in (10) is concave in the vector $\boldsymbol{\gamma}$, and hence the problem (9) can usually be solved efficiently, particularly when cost functions and rate functions are separable over different network components. It is also well known that in many situations, the optimal value of (9) is the same as the optimal value of (8) and in this case we say that there is no duality gap [12]. However, despite the fact that the problem (8) may be non-convex, in which case the duality gap is usually nonzero, our first result shows that the dual problem (9) gives the exact value of $V f_{av}^*$, where $f_{av}^*$ is the optimal time average cost for the stochastic problem. Below, $\boldsymbol{\gamma}_V^* = (\gamma_{V1}^*, \gamma_{V2}^*, \ldots, \gamma_{Vr}^*)^T$ denotes an optimal solution of the dual problem (9) with the corresponding $V$ parameter.

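Theorem 1: Let $\boldsymbol{\gamma}_V^*$ be an optimal solution of the dual problem (9). We have:

$$g(\boldsymbol{\gamma}_V^*) = V f_{av}^*. \qquad (11)$$

Proof: See Appendix A. ■

The following corollary is immediate and will be useful for our following analysis.

Corollary 1: For any $\boldsymbol{\gamma} \succeq \boldsymbol{0}$, we have:

$$g(\boldsymbol{\gamma}) \le V f_{av}^*. \qquad (12)$$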



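In the following, we also define the functions $g_{s_i}(\boldsymbol{\gamma})$ for each $s_i = s_1, s_2, \ldots$ as follows:

$$g_{s_i}(\boldsymbol{\gamma}) = \inf_{x^{(s_i)} \in \mathcal{X}^{(s_i)}} \Big\{ V f(s_i, x^{(s_i)}) + \sum_j \gamma_j \big[ A_j(s_i, x^{(s_i)}) - \mu_j(s_i, x^{(s_i)}) \big] \Big\}. \qquad (13)$$

That is, the $g_{s_i}(\cdot)$ function is the dual function of (8) when the network has only one single network state $s_i$, i.e., the network condition is deterministically described by $s_i$. It is easy to see from (10) and (13) that:

$$g(\boldsymbol{\gamma}) = \sum_{s_i} \pi_{s_i} g_{s_i}(\boldsymbol{\gamma}). \qquad (14)$$

Also, the term $\boldsymbol{G}_{\gamma}^{(s_i)} = (G_{\gamma,1}^{(s_i)}, G_{\gamma,2}^{(s_i)}, \ldots, G_{\gamma,r}^{(s_i)})^T$ with:

$$G_{\gamma,j}^{(s_i)} = \big[ -\mu_j(s_i, x_{\gamma}^{(s_i)}) + A_j(s_i, x_{\gamma}^{(s_i)}) \big], \qquad (15)$$

where $x_{\gamma}^{(s_i)}$ is an action attaining the infimum in (13) at $\boldsymbol{\gamma}$, is called the subgradient of the $g_{s_i}(\cdot)$ function at the point $\boldsymbol{\gamma}$ [12]. It is known that for any other $\hat{\boldsymbol{\gamma}} \in \mathbb{R}^r$, we have:

$$(\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma})^T \boldsymbol{G}_{\gamma}^{(s_i)} \ge g_{s_i}(\hat{\boldsymbol{\gamma}}) - g_{s_i}(\boldsymbol{\gamma}). \qquad (16)$$

Using the fact that $\|\boldsymbol{G}_{\gamma}^{(s_i)}\| \le B$, (16) also implies:

$$g_{s_i}(\hat{\boldsymbol{\gamma}}) - g_{s_i}(\boldsymbol{\gamma}) \le B \|\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma}\|. \qquad (17)$$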
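To make the single-state dual function and its subgradient concrete, the following sketch evaluates both by enumeration over a finite action set. This is an assumption made for illustration only: the paper allows general compact sets, and the helper `g_state` and the toy cost and rate functions are hypothetical.

```python
def g_state(V, gamma, actions, f, mu, A):
    """Return (g_s(gamma), G): the single-state dual value and a subgradient."""
    def lagrangian(x):
        penalty = sum(g * (a - m) for g, a, m in zip(gamma, A(x), mu(x)))
        return V * f(x) + penalty
    x_star = min(actions, key=lagrangian)                # minimizing action
    G = [a - m for a, m in zip(A(x_star), mu(x_star))]   # A_j - mu_j at x_star
    return lagrangian(x_star), G


# Hypothetical single-queue instance: g_s(gamma) = min(gamma, 10 - gamma).
actions = [0, 1]
f = lambda x: float(x)
mu = lambda x: [2.0 * x]
A = lambda x: [1.0]

g_lo, G_lo = g_state(10.0, [2.0], actions, f, mu, A)
g_hi, G_hi = g_state(10.0, [8.0], actions, f, mu, A)
# Subgradient inequality, written for the pair (gamma, gamma_hat) = (2, 8).
holds = (8.0 - 2.0) * G_lo[0] >= g_hi - g_lo
```

Because the dual function is a pointwise minimum of functions that are affine in the multiplier, it is concave, and the constraint violation at the minimizing action serves as a subgradient; the check above is one numerical instance of that inequality.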

V. Performance of QLA under Markovian 
Dynamics 

In this section, we prove that under the Markovian network state dynamics, QLA achieves an exact $[O(1/V), O(V)]$ utility-delay tradeoff for the stochastic problem. This is the first formal proof of this result. It generalizes the $[O(1/V), O(V)]$ performance result of QLA in the i.i.d. case in [1]. To prove the result, we use a variable multi-slot Lyapunov drift argument. Different from previous multi-slot drift arguments, e.g., [13] and [10], where the drift is usually computed over a fixed number of slots, the slot number here is a random variable corresponding to the return time of the network states. As we will see, this variable multi-slot drift analysis allows us to obtain the exact $[O(1/V), O(V)]$ utility-delay tradeoff for QLA. Moreover, it also allows us to state QLA's performance in terms of explicit parameters of the Markovian $S(t)$ process.

In the following, we define $T_i(t_0)$ to be the first return time of $S(t)$ to state $s_i$ given that $S(t_0) = s_i$, i.e.,

$$T_i(t_0) = \inf\{T > 0, \; \text{s.t.} \; S(t_0 + T) = s_i \mid S(t_0) = s_i\}.$$

We see that $T_i(t_0)$ has the same distribution for all $t_0$. Thus, we will use $\bar{T}_i$ to denote the expected value of $T_i(t)$ for any $t$ s.t. $S(t) = s_i$ and use $\overline{T_i^2}$ to denote its second moment. By Theorem 3 in Chapter 5 of [11], we have for all states $s_i$ that:

$$\bar{T}_i = \frac{1}{\pi_{s_i}} < \infty, \qquad (18)$$

i.e., the expected return time of any state $s_i$ is finite. In the following, we also use $T_{ji}(t_0)$ to denote the first hitting time for $S(t)$ to enter the state $s_i$ given that $S(t_0) = s_j$. It is again easy to see that $T_{ji}(t_0)$ has the same distribution at all $t_0$. Hence we similarly use $\bar{T}_{ji}$ and $\overline{T_{ji}^2}$ to denote its first and second moments. Throughout the paper, we make the following assumption:

Assumption 1: There exists a state $s_1$ such that:

$$\overline{T_{j1}^2} < \infty, \quad \forall j.$$

That is, starting from any state $s_j$ (including $s_1$), the random time needed to get into state $s_1$ has a finite second moment. This condition is not very restrictive and can be satisfied in many cases, e.g., when $\mathcal{S}$ is finite.
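For a finite chain, the first and second moments of the return time can be computed numerically. The sketch below propagates the probability mass that has not yet returned to the target state and truncates the sum at a finite horizon; the function name, the two-state transition matrix and the horizon are hypothetical choices for this example.

```python
def return_time_moments(P, target=0, horizon=2000):
    """First and second moments of the first return time to `target`.

    P is a row-stochastic transition matrix given as a list of lists.
    The infinite sums are truncated after `horizon` steps.
    """
    n = len(P)
    alive = [0.0] * n          # mass per state that has not returned yet
    alive[target] = 1.0
    m1 = m2 = 0.0
    for step in range(1, horizon + 1):
        # Probability that the first return happens exactly at this step.
        hit = sum(alive[s] * P[s][target] for s in range(n))
        m1 += step * hit
        m2 += step * step * hit
        # Keep only paths that moved to a non-target state.
        alive = [0.0 if u == target else sum(alive[s] * P[s][u] for s in range(n))
                 for u in range(n)]
    return m1, m2


# Hypothetical two-state chain with stationary distribution (0.4, 0.6).
P = [[0.7, 0.3],
     [0.2, 0.8]]
mean_T, second_T = return_time_moments(P, target=0)
```

For this chain the mean return time to state 0 equals $1/0.4 = 2.5$, in line with (18), and the second moment is finite, as Assumption 1 requires.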

We now have the following theorem summarizing QLA's performance under the Markovian network state dynamics:

Theorem 2: Suppose (1) holds. Then under the Markovian network state process $S(t)$, the QLA algorithm achieves the following:

$$f_{av}^{QLA} \le f_{av}^* + \frac{C B^2}{V \bar{T}_1}, \qquad (19)$$
$$\bar{q}^{QLA} \le \frac{C B^2 + \bar{T}_1 V \delta_{max}}{\eta} + \frac{D B^2}{2}, \qquad (20)$$

where $\eta > 0$ is the slack parameter defined in (1) in Section III-B, and $C$, $D$ are defined as:

$$C = \overline{T_1^2} + \bar{T}_1, \quad D = \overline{T_1^2} - \bar{T}_1, \qquad (21)$$

i.e., $C$ and $D$ are the sum and difference of the first and second moments of the return time associated with $s_1$.

Note that $\eta, C, D = O(1)$ in (20), i.e., independent of $V$. Hence Theorem 2 shows that QLA indeed achieves an exact $[O(1/V), O(V)]$ utility-delay tradeoff for general stochastic network optimization problems with Markovian network dynamics. Although our bounds may be loose when the number of states is large, we note that Theorem 2 also applies to the case when $S(t)$ evolves according to a Markov modulated i.i.d. process, in which case there is a Markov chain of only a few states, but in each Markov state, there can be many i.i.d. random outcomes. For example, suppose $S(t)$ is i.i.d. with $10^4$ states. Then we can view $S(t)$ as having one Markov state, but within the Markov state, it has $10^4$ i.i.d. random choices. In this case, Theorem 2 will apply with $C = 2$ and $D = 0$. These Markov modulated processes can easily be incorporated into our analysis by taking expectation over the i.i.d. randomness of the current Markov state in Equation (22). They are important in stochastic modeling and include the ON/OFF processes for modeling time-correlated arrival processes, e.g., [14].
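The constants in Theorem 2 are simple functions of the return-time moments, so the bounds can be evaluated directly. The sketch below assumes the reading of (19) to (21) in which the utility gap is $CB^2/(V\bar{T}_1)$ and the backlog bound is $(CB^2 + \bar{T}_1 V \delta_{max})/\eta + DB^2/2$; the function name and the parameter values are hypothetical.

```python
import math


def theorem2_bounds(V, r, delta_max, eta, T1_mean, T1_second_moment):
    """Evaluate C, D and the two bounds under the assumed reading of (19)-(21)."""
    B = math.sqrt(r) * delta_max                 # B as defined in (2)
    C = T1_second_moment + T1_mean               # sum of the two moments
    D = T1_second_moment - T1_mean               # difference of the two moments
    utility_gap = C * B ** 2 / (V * T1_mean)
    backlog_bound = (C * B ** 2 + T1_mean * V * delta_max) / eta + D * B ** 2 / 2
    return C, D, utility_gap, backlog_bound


# i.i.d. special case: the return time is always 1, so both moments equal 1.
C, D, gap, backlog = theorem2_bounds(
    V=100.0, r=1, delta_max=1.0, eta=0.5, T1_mean=1.0, T1_second_moment=1.0
)
```

With both moments equal to one, the sketch returns $C = 2$ and $D = 0$, the values quoted in the text for the i.i.d. case; the gap then shrinks like $1/V$ while the backlog bound grows linearly in $V$.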

Proof: (Theorem 2) To prove the theorem, we first define the Lyapunov function $L(t) = \frac{1}{2} \sum_{j=1}^{r} q_j^2(t)$. By using the queueing dynamic equation (3), it is easy to obtain that:

$$\frac{1}{2} q_j^2(t+1) - \frac{1}{2} q_j^2(t) \le \delta_{max}^2 + q_j(t) \big[ A_j(t) - \mu_j(t) \big].$$

Summing over all $j = 1, \ldots, r$ and adding to both sides the term $V f(t)$, we obtain:

$$L(t+1) - L(t) + V f(t) \le B^2 + \Big\{ V f(t) + \sum_{j=1}^{r} q_j(t) \big[ A_j(t) - \mu_j(t) \big] \Big\}. \qquad (22)$$

We see from (6) that given the network state $S(t)$, QLA chooses an action to minimize the right-hand side (RHS) at time $t$. Now comparing the term in $\{\}$ in the RHS of (22) with (13), we see that we indeed have:

$$L(t+1) - L(t) + V f^Q(t) \le B^2 + g_{S(t)}(\boldsymbol{q}(t)), \qquad (23)$$

where we use $f^Q(t) = f(x^{QLA}(t))$ to denote the utility incurred by QLA's action at time $t$, and $g_{S(t)}(\cdot)$ is the function (13) with the network state being $S(t)$.
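The per-queue drift inequality used above can be checked numerically. The sketch below draws random backlogs, arrivals and services, with arrivals and services in $[0, \delta_{max}]$, and records the smallest slack between the two sides; the function name, the sampling ranges and the seed are arbitrary choices made for this illustration.

```python
import random


def drift_gap(q, mu, A, delta_max):
    """Slack in 0.5*q(t+1)^2 - 0.5*q(t)^2 <= delta_max^2 + q(t)*(A - mu)."""
    q_next = max(q - mu, 0.0) + A
    lhs = 0.5 * q_next ** 2 - 0.5 * q ** 2
    rhs = delta_max ** 2 + q * (A - mu)
    return rhs - lhs


random.seed(0)
delta_max = 3.0
worst = min(
    drift_gap(random.uniform(0, 50), random.uniform(0, delta_max),
              random.uniform(0, delta_max), delta_max)
    for _ in range(10000)
)
```

A nonnegative value of `worst` means the inequality held on every sample, which is what the one-slot bound predicts for nonnegative backlogs.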

(Part A: Proof of Utility) We first prove the utility performance. Consider $t = 0$ and first assume that $S(0) = s_1$. Summing up the inequality (23) from time $t = 0$ to time $t = T_1(0) - 1$, we have:

$$L(T_1(0)) - L(0) + \sum_{t=0}^{T_1(0)-1} V f^Q(t) \le T_1(0) B^2 + \sum_{t=0}^{T_1(0)-1} g_{S(t)}(\boldsymbol{q}(t)).$$

This can be rewritten as:

$$L(T_1(0)) - L(0) + \sum_{t=0}^{T_1(0)-1} V f^Q(t) \le T_1(0) B^2 + \sum_{t=0}^{T_1(0)-1} g_{S(t)}(\boldsymbol{q}(0)) + \sum_{t=0}^{T_1(0)-1} \big[ g_{S(t)}(\boldsymbol{q}(t)) - g_{S(t)}(\boldsymbol{q}(0)) \big]. \qquad (24)$$

Using (17) and the fact that $\|\boldsymbol{q}(t + \tau) - \boldsymbol{q}(t)\| \le \tau B$, we see that the final term can be bounded by:

$$\sum_{t=0}^{T_1(0)-1} \big[ g_{S(t)}(\boldsymbol{q}(t)) - g_{S(t)}(\boldsymbol{q}(0)) \big] \le \sum_{t=0}^{T_1(0)-1} t B^2.$$

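Plugging this into (24), and letting $\hat{C} = \frac{1}{2}(T_1(0))^2 + \frac{1}{2} T_1(0)$, we obtain:

$$L(T_1(0)) - L(0) + \sum_{t=0}^{T_1(0)-1} V f^Q(t) \le \hat{C} B^2 + \sum_{t=0}^{T_1(0)-1} g_{S(t)}(\boldsymbol{q}(0)) = \hat{C} B^2 + \sum_{s_i} n_{s_i}^{T_1(0)}(0) \, g_{s_i}(\boldsymbol{q}(0)). \qquad (25)$$

Here $n_{s_i}^{T_1(0)}(t_0)$ denotes the number of times the network state $s_i$ appears in the period $[t_0, t_0 + T_1(0) - 1]$. Now we take expectations over $T_1(0)$ on both sides conditioning on $S(0) = s_1$ and $\boldsymbol{q}(0)$, and we have:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \sum_{s_i} E\{ n_{s_i}^{T_1(0)}(0) \mid S(0), \boldsymbol{q}(0) \} \, g_{s_i}(\boldsymbol{q}(0)). \qquad (26)$$

Here $C = E\{\hat{C} \mid S(0), \boldsymbol{q}(0)\} = \frac{1}{2}[\overline{T_1^2} + \bar{T}_1]$. The above equation uses the fact that $g_{s_i}(\boldsymbol{q}(0))$ is a constant given $\boldsymbol{q}(0)$. Now by Theorem 2 in Page 154 of [11] we have that:

$$E\{ n_{s_i}^{T_1(0)}(0) \mid S(0), \boldsymbol{q}(0) \} = \frac{\pi_{s_i}}{\pi_{s_1}}. \qquad (27)$$

Plugging this into (26), we have:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \frac{1}{\pi_{s_1}} \sum_{s_i} \pi_{s_i} g_{s_i}(\boldsymbol{q}(0)). \qquad (28)$$

Now using (18) and (14), i.e., $\bar{T}_1 = 1/\pi_{s_1}$ and $g(\boldsymbol{\gamma}) = \sum_{s_i} \pi_{s_i} g_{s_i}(\boldsymbol{\gamma})$, we obtain:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \bar{T}_1 g(\boldsymbol{q}(0)). \qquad (29)$$

By Corollary 1 we see that $g(\boldsymbol{q}(0)) \le V f_{av}^*$. Thus we conclude that:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \bar{T}_1 V f_{av}^*. \qquad (30)$$

More generally, if $t_k = t_{k-1} + T_1(t_{k-1})$ with $t_0 = 0$ is the $k$-th time after time 0 when $S(t) = s_1$, we have:

$$E\{L(t_{k+1}) - L(t_k) \mid S(t_k), \boldsymbol{q}(t_k)\} + E\Big\{ \sum_{t=t_k}^{t_{k+1}-1} V f^Q(t) \mid S(t_k), \boldsymbol{q}(t_k) \Big\} \le C B^2 + \bar{T}_1 V f_{av}^*. \qquad (31)$$

Now taking expectations over $\boldsymbol{q}(t_k)$ on both sides, we have:

$$E\{L(t_{k+1}) - L(t_k) \mid S(t_k)\} + E\Big\{ \sum_{t=t_k}^{t_{k+1}-1} V f^Q(t) \mid S(t_k) \Big\} \le C B^2 + \bar{T}_1 V f_{av}^*.$$

Note that given $S(0) = s_1$, we have the complete information of $S(t_k)$ for all $k$. Hence the above is the same as:

$$E\{L(t_{k+1}) - L(t_k) \mid S(0)\} + E\Big\{ \sum_{t=t_k}^{t_{k+1}-1} V f^Q(t) \mid S(0) \Big\} \le C B^2 + \bar{T}_1 V f_{av}^*. \qquad (32)$$

Summing the above from $k = 0$ to $K - 1$, we get:

$$E\{L(t_K) - L(0) \mid S(0) = s_1\} + E\Big\{ \sum_{t=0}^{t_K-1} V f^Q(t) \mid S(0) = s_1 \Big\} \le K C B^2 + K \bar{T}_1 V f_{av}^*. \qquad (33)$$

Using the facts that $|f(t)| \le \delta_{max}$, $\lceil K \bar{T}_1 \rceil \le K \bar{T}_1 + 1$, $L(0) = 0$ and $L(t) \ge 0$ for all $t$, we have:

$$E\Big\{ \sum_{t=0}^{\lceil K \bar{T}_1 \rceil - 1} V f^Q(t) \mid S(0) = s_1 \Big\} \le K C B^2 + K \bar{T}_1 V f_{av}^* + V \delta_{max} + V \delta_{max} E\{ |K \bar{T}_1 - t_K| \mid S(0) = s_1 \}. \qquad (34)$$

Dividing both sides by $V \lceil K \bar{T}_1 \rceil$, we get:

$$\frac{1}{\lceil K \bar{T}_1 \rceil} E\Big\{ \sum_{t=0}^{\lceil K \bar{T}_1 \rceil - 1} f^Q(t) \mid S(0) = s_1 \Big\} \le \frac{C B^2 K}{V \lceil K \bar{T}_1 \rceil} + \frac{K \bar{T}_1 f_{av}^*}{\lceil K \bar{T}_1 \rceil} + \frac{\delta_{max}}{\lceil K \bar{T}_1 \rceil} + E\Big\{ \Big| \frac{t_K - K \bar{T}_1}{K} \Big| \mid S(0) = s_1 \Big\} \frac{K \delta_{max}}{\lceil K \bar{T}_1 \rceil}. \qquad (35)$$

Since $t_K = \sum_{k=0}^{K-1} T_1(t_k)$ with $t_0 = 0$, and each $T_1(t_k)$ is i.i.d. distributed with mean $\bar{T}_1$ and second moment $\overline{T_1^2} < \infty$, we have:

$$\Big[ E\Big\{ \Big| \frac{t_K - K \bar{T}_1}{K} \Big| \mid S(0) = s_1 \Big\} \Big]^2 \le E\Big\{ \Big| \frac{t_K - K \bar{T}_1}{K} \Big|^2 \mid S(0) = s_1 \Big\} \le \frac{\overline{T_1^2}}{K}. \qquad (36)$$

This implies that the term $E\{ |\frac{t_K - K \bar{T}_1}{K}| \mid S(0) = s_1 \} \to 0$ as $K \to \infty$. It is also easy to see that $\lceil K \bar{T}_1 \rceil \to \infty$ and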
$\frac{K}{\lceil K \bar{T}_1 \rceil} \to \frac{1}{\bar{T}_1}$ as $K \to \infty$. Thus using (36) and taking a limsup as $K \to \infty$ in (35), we have:

$$\limsup_{K \to \infty} \frac{1}{\lceil K \bar{T}_1 \rceil} E\Big\{ \sum_{t=0}^{\lceil K \bar{T}_1 \rceil - 1} f^Q(t) \mid S(0) = s_1 \Big\} \le \frac{C B^2}{V \bar{T}_1} + f_{av}^*.$$

Now consider the case when the starting state is $s_j \ne s_1$. In this case, let $T_{j1}(0)$ be the first time the system enters state $s_1$. Then we see that the above argument can be repeated for the system starting at time $T_{j1}(0)$. The only difference is that now the "initial" backlog in this case is given by $\boldsymbol{q}(T_{j1}(0))$. Specifically, we have from (33) that:

$$E\{L(\hat{t}_K) - L(T_{j1}(0)) \mid T_{j1}(0), S(0) = s_j\} + E\Big\{ \sum_{t=T_{j1}(0)}^{\hat{t}_K - 1} V f^Q(t) \mid T_{j1}(0), S(0) = s_j \Big\} \le K C B^2 + K \bar{T}_1 V f_{av}^*. \qquad (37)$$

Here $\hat{t}_K$ is the $K$-th return time of $S(t)$ to $s_1$ after time $T_{j1}(0)$. We thus obtain:

$$V E\Big\{ \sum_{t=T_{j1}(0)}^{\hat{t}_K - 1} f^Q(t) \mid T_{j1}(0), S(0) = s_j \Big\} \le K C B^2 + K \bar{T}_1 V f_{av}^* + E\{ L(T_{j1}(0)) \mid T_{j1}(0), S(0) = s_j \}.$$

However, since the increment of each queue is no more than $\delta_{max}$ every time slot, we see that $L(T_{j1}(0)) \le [T_{j1}(0)]^2 B^2 / 2$. Also using the fact that $|f(t)| \le \delta_{max}$ for all $0 \le t \le T_{j1}(0)$, we have:

$$V E\Big\{ \sum_{t=0}^{\hat{t}_K - 1} f^Q(t) \mid T_{j1}(0), S(0) = s_j \Big\} \le K C B^2 + K \bar{T}_1 V f_{av}^* + [T_{j1}(0)]^2 B^2 / 2 + T_{j1}(0) V \delta_{max}.$$

Now taking expectations over $T_{j1}(0)$ on both sides, and using a similar argument as above, we get that for every starting state $s_j$ we have:

$$\limsup_{K \to \infty} \frac{1}{\lceil \bar{T}_{j1} + K \bar{T}_1 \rceil} E\Big\{ \sum_{t=0}^{\lceil \bar{T}_{j1} + K \bar{T}_1 \rceil - 1} f^Q(t) \mid S(0) = s_j \Big\} \le \frac{C B^2}{V \bar{T}_1} + f_{av}^*.$$

This proves the utility part (19).

(Part B: Proof of Backlog) Now we look at the backlog performance of QLA. We similarly first assume $S(0) = s_1$. Recall that equation (29) says:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \bar{T}_1 g(\boldsymbol{q}(0)). \qquad (38)$$

Using the definition of $g_c(\boldsymbol{\gamma})$ defined in (49) in Appendix A, plugging in the set of variables $\{\vartheta_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,2,\ldots,r+2}$ and the set of actions $\{x_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,2,\ldots,r+2}$ in the slackness assumption (1), and using the facts that $g(\boldsymbol{\gamma}) = g_c(\boldsymbol{\gamma})$ and $0 \le f(t) \le \delta_{max}$, it can be shown that $g(\boldsymbol{\gamma})$ satisfies:

$$g(\boldsymbol{\gamma}) \le V \delta_{max} - \eta \sum_{j=1}^{r} \gamma_j. \qquad (39)$$

Using this in (38), we have:

$$E\{L(T_1(0)) - L(0) \mid S(0), \boldsymbol{q}(0)\} + E\Big\{ \sum_{t=0}^{T_1(0)-1} V f^Q(t) \mid S(0), \boldsymbol{q}(0) \Big\} \le C B^2 + \bar{T}_1 V \delta_{max} - \bar{T}_1 \eta \sum_{j=1}^{r} q_j(0). \qquad (40)$$

More generally, we have:

$$E\{L(t_{k+1}) - L(t_k) \mid S(t_k), \boldsymbol{q}(t_k)\} \le C B^2 + \bar{T}_1 V \delta_{max} - \bar{T}_1 \eta \sum_{j=1}^{r} q_j(t_k). \qquad (41)$$

Here $t_k$ is the $k$-th return time of $S(t)$ to state $s_1$ after time 0. Taking expectations on both sides over $\boldsymbol{q}(t_k)$ and rearranging the terms, we get:

$$E\{L(t_{k+1}) - L(t_k) \mid S(t_k)\} + \bar{T}_1 \eta \sum_{j=1}^{r} E\{ q_j(t_k) \mid S(t_k) \} \le C B^2 + \bar{T}_1 V \delta_{max}. \qquad (42)$$

Now using the fact that conditioning on $S(t_k)$ is the same as conditioning on $S(0)$, we have:

$$E\{L(t_{k+1}) - L(t_k) \mid S(0)\} + \bar{T}_1 \eta \sum_{j=1}^{r} E\{ q_j(t_k) \mid S(0) \} \le C B^2 + \bar{T}_1 V \delta_{max}. \qquad (43)$$

Summing over $k = 0, \ldots, K - 1$, rearranging the terms, and using the facts that $L(0) = 0$ and $L(t) \ge 0$ for all $t$:

$$\bar{T}_1 \eta \sum_{k=0}^{K-1} \sum_{j=1}^{r} E\{ q_j(t_k) \mid S(0) \} \le K C B^2 + K \bar{T}_1 V \delta_{max}. \qquad (44)$$

Dividing both sides by $K \bar{T}_1 \eta$, we get:

$$\frac{1}{K} \sum_{k=0}^{K-1} \sum_{j=1}^{r} E\{ q_j(t_k) \mid S(0) \} \le \frac{C B^2 + \bar{T}_1 V \delta_{max}}{\bar{T}_1 \eta}. \qquad (45)$$

Now using the fact that $|q_j(t + \tau) - q_j(t)| \le \tau \delta_{max}$, we have:

$$\sum_{\tau=t_k}^{t_{k+1}-1} \sum_{j=1}^{r} q_j(\tau) \le T_1(t_k) \sum_{j=1}^{r} q_j(t_k) + \Big[ \frac{1}{2} (T_1(t_k))^2 - \frac{1}{2} T_1(t_k) \Big] B^2.$$
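Taking expectations on both sides conditioning on $S(0)$ (which is the same as conditioning on $S(t_k)$), we get:

$$E\Big\{ \sum_{\tau=t_k}^{t_{k+1}-1} \sum_{j=1}^{r} q_j(\tau) \mid S(0) \Big\} \le E\Big\{ T_1(t_k) \sum_{j=1}^{r} q_j(t_k) \mid S(0) \Big\} + \Big[ \frac{\overline{T_1^2}}{2} - \frac{\bar{T}_1}{2} \Big] B^2 = \bar{T}_1 E\Big\{ \sum_{j=1}^{r} q_j(t_k) \mid S(0) \Big\} + \Big[ \frac{\overline{T_1^2}}{2} - \frac{\bar{T}_1}{2} \Big] B^2.$$

In the last step, we have used the fact that $T_1(t_k)$ is independent of $\boldsymbol{q}(t_k)$. Summing the above equation over $k = 0, 1, \ldots, K - 1$, we have:

$$E\Big\{ \sum_{t=0}^{t_K - 1} \sum_{j=1}^{r} q_j(t) \mid S(0) \Big\} \le \sum_{k=0}^{K-1} \bar{T}_1 E\Big\{ \sum_{j=1}^{r} q_j(t_k) \mid S(0) \Big\} + \frac{K [\overline{T_1^2} - \bar{T}_1] B^2}{2}.$$

Dividing both sides by $K$ and using (45), we have:

$$\frac{1}{K} E\Big\{ \sum_{t=0}^{t_K - 1} \sum_{j=1}^{r} q_j(t) \mid S(0) \Big\} \le \frac{\bar{T}_1}{K} \sum_{k=0}^{K-1} \sum_{j=1}^{r} E\{ q_j(t_k) \mid S(0) \} + \frac{[\overline{T_1^2} - \bar{T}_1] B^2}{2} \le \frac{C B^2 + \bar{T}_1 V \delta_{max}}{\eta} + \frac{[\overline{T_1^2} - \bar{T}_1] B^2}{2}.$$

Now notice that we always have $t_K \ge K$. Hence:

$$\frac{1}{K} \sum_{t=0}^{K-1} E\Big\{ \sum_{j=1}^{r} q_j(t) \mid S(0) \Big\} \le \frac{1}{K} E\Big\{ \sum_{t=0}^{t_K - 1} \sum_{j=1}^{r} q_j(t) \mid S(0) \Big\} \le \frac{C B^2 + \bar{T}_1 V \delta_{max}}{\eta} + \frac{[\overline{T_1^2} - \bar{T}_1] B^2}{2}.$$

This proves (20) for the case when $S(0) = s_1$. The case when $S(0) = s_j \ne s_1$ can be treated in a similar way as in Part A. It can be shown that the above backlog bound still holds, as the effect of the backlog values before the first hitting time $T_{j1}(0)$ will vanish as time increases. This proves the backlog bound (20). Theorem 2 thus follows by combining the two proofs. ■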

Appendix A - Proof of Theorem 1

We now prove Theorem 1. The proof idea is shown in Fig. 1 and can be described as follows: First we construct a "convexified" version of the deterministic problem (8) and show that it gives the exact value of $V f_{av}^*$. We then show that the dual function $g_c(\boldsymbol{\gamma})$ of this convexified problem is exactly the same as the dual function $g(\boldsymbol{\gamma})$ of (9). Hence the two dual problems have the same optimal value. We finally show that the duality gap is zero for the convexified problem by showing that its "utility-constraint" set is convex. Hence $g(\boldsymbol{\gamma}_V^*) = g_c^* = V f_{av}^*$, where $g_c^*$ is the optimal value of the dual problem for the convexified problem.




[Figure 1: two panels; recovered axis label: A_bar(x) - B_bar(x)]

Fig. 1. The left figure shows the utility-constraint set of the deterministic problem with $r = 1$, $M = 1$ and its dual function. The right figure shows the utility-constraint set of the corresponding "convexified" problem and its dual function. It can be seen that the two dual functions are the same, and that the "convexified" problem has no duality gap.
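The idea in Fig. 1 can be reproduced on a toy instance with one state and one queue. The sketch below uses two hypothetical actions, each described by a cost and a drift $A - \mu$; it compares the optimum of the deterministic problem, the maximum of the dual function over a grid, and the optimum of the convexified problem over a grid of mixing weights. All numbers are illustrative assumptions.

```python
# Toy instance with one state and one queue (r = 1); numbers are illustrative.
V = 1.0
actions = {"serve": (1.0, -1.0), "idle": (0.0, 1.0)}  # name -> (cost f, drift A - mu)

# Deterministic problem: a single action must satisfy A - mu <= 0.
opt_det = min(V * f for f, d in actions.values() if d <= 0)


def g(gamma):
    """Dual function: min over actions of V*f + gamma*(A - mu)."""
    return min(V * f + gamma * d for f, d in actions.values())


# Maximize the dual function over a grid of multipliers in [0, 2].
dual_opt = max(g(i / 100) for i in range(0, 201))


def mix(a):
    """Cost and drift when 'serve' is used with weight a and 'idle' with 1 - a."""
    f = a * actions["serve"][0] + (1 - a) * actions["idle"][0]
    d = a * actions["serve"][1] + (1 - a) * actions["idle"][1]
    return V * f, d


# Convexified problem: search the mixing weight on a grid.
opt_conv = min(c for c, d in (mix(i / 100) for i in range(101)) if d <= 1e-12)
```

Here the deterministic optimum is 1, while both the dual maximum and the convexified optimum equal 0.5, so the dual problem matches the convexified problem rather than the deterministic one, as the figure caption describes.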



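Proof: (Theorem 1) For notation simplicity, we denote the set of $\boldsymbol{x} = (x^{(s_1)}, x^{(s_2)}, \ldots)$, $x^{(s_i)} \in \mathcal{X}^{(s_i)}$, as $\mathcal{X}$. We then consider the following modified deterministic problem:

$$\min: \; \mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \triangleq V \sum_{s_i} \pi_{s_i} \sum_{k=1}^{r+2} a_k^{(s_i)} f(s_i, x_k^{(s_i)}) \qquad (46)$$
$$\text{s.t.} \quad \mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \triangleq \sum_{s_i} \pi_{s_i} \sum_{k=1}^{r+2} a_k^{(s_i)} A_j(s_i, x_k^{(s_i)}) \le \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \triangleq \sum_{s_i} \pi_{s_i} \sum_{k=1}^{r+2} a_k^{(s_i)} \mu_j(s_i, x_k^{(s_i)}) \quad \forall j,$$
$$\boldsymbol{x}_k \in \mathcal{X} \quad \forall k = 1, \ldots, r+2,$$
$$a_k^{(s_i)} \ge 0, \quad \sum_{k=1}^{r+2} a_k^{(s_i)} = 1, \quad \forall s_i.$$

Here $\boldsymbol{x}_k = (x_k^{(s_1)}, x_k^{(s_2)}, \ldots)$. Due to the use of the auxiliary variables $\{a_k^{(s_i)}\}$, this problem can be viewed as the "convexified" version of the original deterministic problem (8). Denote the optimal value of (46) as $OPT_c$.$^2$ We will prove Theorem 1 via the following two claims. The first claim shows that $OPT_c = V f_{av}^*$ and the second claim shows that $OPT_c = g(\boldsymbol{\gamma}_V^*)$.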

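Claim 1: $V f_{av}^* = OPT_c$.

Proof: (Claim 1) For each action vector $\boldsymbol{x} \in \mathcal{X}$, we define its "utility-constraint" vector $J(\boldsymbol{x})$ as follows:

$$J(\boldsymbol{x}) = (\mathcal{F}(\boldsymbol{x}), \mathcal{A}_1(\boldsymbol{x}) - \mathcal{B}_1(\boldsymbol{x}), \ldots, \mathcal{A}_r(\boldsymbol{x}) - \mathcal{B}_r(\boldsymbol{x})).$$

Denote $\mathcal{J} = \{J(\boldsymbol{x}) : \boldsymbol{x} \in \mathcal{X}\}$, i.e., $\mathcal{J}$ is the set of all possible utility-constraint vectors for $\mathcal{X}$, and denote $\hat{\mathcal{J}}$ the convex hull of $\mathcal{J}$. Let $\Pi^*$ be an optimal action-choosing policy that solves the stochastic problem. Now define the "utility-constraint" vector $J^{\Pi^*}$ for $\Pi^*$ as:

$$J^{\Pi^*} = (V f^{\Pi^*}, \bar{A}_1^{\Pi^*} - \bar{\mu}_1^{\Pi^*}, \ldots, \bar{A}_r^{\Pi^*} - \bar{\mu}_r^{\Pi^*}),$$

where $f^{\Pi^*} = f_{av}^*$ is the time average cost under $\Pi^*$, and $\bar{A}_j^{\Pi^*}$, $\bar{\mu}_j^{\Pi^*}$ are the time average input and output rates to queue $j$ under $\Pi^*$. Note that here we have assumed without loss of generality that the time averages converge.$^3$ It can then be shown by using an argument similar to that in [3] that the vector $J^{\Pi^*} \in \hat{\mathcal{J}}$. Using Caratheodory's theorem [12], we see then there exist $\{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ with $a_k^{(s_i)} \ge 0$, $\sum_{k=1}^{r+2} a_k^{(s_i)} = 1$, and a set of action vectors $\{\boldsymbol{x}_k\}_{k=1}^{r+2} \subset \mathcal{X}$ such that:

$$\sum_{s_i} \pi_{s_i} \sum_{k=1}^{r+2} a_k^{(s_i)} f(s_i, x_k^{(s_i)}) = f^{\Pi^*},$$
$$\sum_{s_i} \pi_{s_i} \sum_{k=1}^{r+2} a_k^{(s_i)} \big[ A_j(s_i, x_k^{(s_i)}) - \mu_j(s_i, x_k^{(s_i)}) \big] = \bar{A}_j^{\Pi^*} - \bar{\mu}_j^{\Pi^*} \le 0, \quad \forall j.$$

The inequality $\bar{A}_j^{\Pi^*} - \bar{\mu}_j^{\Pi^*} \le 0$ holds since $\Pi^*$ is by definition a stabilizing policy. This shows that $\{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$, $\{\boldsymbol{x}_k\}_{k=1}^{r+2}$ is a feasible solution of (46), implying $V f_{av}^* \ge OPT_c$.

2 Without loss of generality, we assume such an optimal value exists. Else we can replace the "min" with "inf" in (46), consider an $\epsilon$-optimal solution and let $\epsilon \to 0$. Below we will use similar assumptions about the existence of an optimal policy for the stochastic problem, and the existence of an optimal solution to (46).

To prove the other direction, let $\{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ and $\{\boldsymbol{x}_k\}_{k=1}^{r+2}$ be an optimal solution pair of (46). Now by our slackness assumption (1), there exists a set of actions $\{\hat{x}_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ and probabilities $\{\vartheta_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ with $\sum_k \vartheta_k^{(s_i)} = 1$ such that $\sum_{s_i} \pi_{s_i} \{ \sum_k \vartheta_k^{(s_i)} [A_j(s_i, \hat{x}_k^{(s_i)}) - \mu_j(s_i, \hat{x}_k^{(s_i)})] \} \le -\eta$ for some $\eta > 0$ for all $j$. We can thus construct the following policy $\Pi'$: fix some $\epsilon \in (0, 1)$; at every state $s_i$, choose action $x_k^{(s_i)}$ with probability $(1 - \epsilon) a_k^{(s_i)}$ and choose action $\hat{x}_k^{(s_i)}$ with probability $\epsilon \vartheta_k^{(s_i)}$. Since $|f(t)| \le \delta_{max}$ for all $t$, it is easy to see that under $\Pi'$:

$$|f^{\Pi'} - OPT_c / V| \le \epsilon \delta_{max}, \qquad (47)$$

and that for each queue $j$, $\bar{A}_j^{\Pi'} - \bar{\mu}_j^{\Pi'} \le -\epsilon \eta < 0$. This policy can be shown to ensure that the network is strongly stable. Hence $\Pi'$ is a feasible control policy. Therefore $f^{\Pi'} \ge f_{av}^*$ by the definition of $f_{av}^*$. Using this fact together with (47), we have $OPT_c / V \ge f_{av}^* - \epsilon \delta_{max}$. Since this holds for all $\epsilon > 0$, we have $OPT_c / V \ge f_{av}^*$. ■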
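Claim 2: $OPT_c = g(\boldsymbol{\gamma}_V^*)$.

Proof: We first look at the dual problem of (46):

$$\max: \; g_c(\boldsymbol{\gamma}), \quad \text{s.t.} \; \boldsymbol{\gamma} \succeq \boldsymbol{0}, \qquad (48)$$

where $g_c(\boldsymbol{\gamma})$ is defined:

$$g_c(\boldsymbol{\gamma}) = \inf_{x_k^{(s_i)} \in \mathcal{X}^{(s_i)}, \, a_k^{(s_i)}} \sum_{s_i} \pi_{s_i} \Big\{ V \sum_{k=1}^{r+2} a_k^{(s_i)} f(s_i, x_k^{(s_i)}) + \sum_j \gamma_j \Big[ \sum_{k=1}^{r+2} a_k^{(s_i)} A_j(s_i, x_k^{(s_i)}) - \sum_{k=1}^{r+2} a_k^{(s_i)} \mu_j(s_i, x_k^{(s_i)}) \Big] \Big\}. \qquad (49)$$

Now by comparing (49) and (10), we see that $g_c(\boldsymbol{\gamma}) = g(\boldsymbol{\gamma})$ for all $\boldsymbol{\gamma} \succeq \boldsymbol{0}$. This is so because at any $\boldsymbol{\gamma} \succeq \boldsymbol{0}$, we first have $g_c(\boldsymbol{\gamma}) \le g(\boldsymbol{\gamma})$. Now if $\{x^{(s_i)}\}_{i=1}^{\infty}$ are the minimizers of $g(\boldsymbol{\gamma})$, then $\{x_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ with $x_k^{(s_i)} = x^{(s_i)}$, $a_1^{(s_i)} = 1$ and $a_k^{(s_i)} = 0$ if $k \ne 1$, will also be the minimizers of $g_c(\boldsymbol{\gamma})$. This shows $g_c(\boldsymbol{\gamma}) = g(\boldsymbol{\gamma})$, which then implies that $g_c^* = g(\boldsymbol{\gamma}_V^*)$, where $g_c^*$ is the optimal value of (48).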

3 In the case when this assumption is violated, the same argument can be 
applied to the limit points of the time averages but is more involved. 



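Now it remains to show that $g_c^* = OPT_c$. It suffices to show that $g_c^* \ge OPT_c$. We prove this claim by using a similar approach as that in Page 234 of [15]. Denote the set $\Gamma = \{ \{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2} : a_k^{(s_i)} \ge 0, \sum_k a_k^{(s_i)} = 1, \forall s_i \}$. Consider the set $\mathcal{M}$ as follows:

$$\mathcal{M} = \Big\{ (u, c_1, \ldots, c_r) \; \Big| \; \exists \, \{a_k^{(s_i)}\} \in \Gamma, \{\boldsymbol{x}_k\}_{k=1}^{r+2} \subset \mathcal{X} \; \text{s.t.} \; \mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \le u, \; \text{and} \; \mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) - \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \le c_j, \; \forall j \Big\}.$$

It is not difficult to show that $\mathcal{J} \subset \mathcal{M}$. We now show that $\mathcal{M}$ is convex. Indeed, if two vectors $(u, c_1, \ldots, c_r)$ and $(\hat{u}, \hat{c}_1, \ldots, \hat{c}_r)$ are both in $\mathcal{M}$, then there exist $\{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$, $\{\boldsymbol{x}_k\}_{k=1}^{r+2}$ and $\{\hat{a}_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$, $\{\hat{\boldsymbol{x}}_k\}_{k=1}^{r+2}$ such that:

$$\mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \le u, \quad \mathcal{F}(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}) \le \hat{u},$$
$$\mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) - \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \le c_j, \quad \mathcal{A}_j(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}) - \mathcal{B}_j(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}) \le \hat{c}_j, \quad \forall j.$$

Now consider the vector $\theta \cdot (u, c_1, \ldots, c_r) + (1 - \theta) \cdot (\hat{u}, \hat{c}_1, \ldots, \hat{c}_r)$. Using Caratheodory's theorem again, we see that there exist $\{\tilde{a}_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$, $\{\tilde{\boldsymbol{x}}_k\}_{k=1}^{r+2}$ such that:

$$\mathcal{F}(\{\tilde{a}_k^{(s_i)}, \tilde{\boldsymbol{x}}_k\}) = \theta \mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) + (1 - \theta) \mathcal{F}(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}),$$
$$\mathcal{A}_j(\{\tilde{a}_k^{(s_i)}, \tilde{\boldsymbol{x}}_k\}) - \mathcal{B}_j(\{\tilde{a}_k^{(s_i)}, \tilde{\boldsymbol{x}}_k\}) = \theta \big[ \mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) - \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \big] + (1 - \theta) \big[ \mathcal{A}_j(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}) - \mathcal{B}_j(\{\hat{a}_k^{(s_i)}, \hat{\boldsymbol{x}}_k\}) \big].$$

This implies that $\theta \cdot (u, c_1, \ldots, c_r) + (1 - \theta) \cdot (\hat{u}, \hat{c}_1, \ldots, \hat{c}_r) \in \mathcal{M}$, hence $\mathcal{M}$ is convex.

We now define a second convex set $\mathcal{D}$ as $\mathcal{D} = \{ (\nu, c_1, \ldots, c_r) \mid \nu < OPT_c, c_j = 0, \forall j \}$. It is easy to see that $\mathcal{M} \cap \mathcal{D}$ is empty, for otherwise $OPT_c$ cannot be the optimal value of (46). Therefore there exists a hyperplane with normal vector $(\zeta, \gamma_1, \ldots, \gamma_r) \ne \boldsymbol{0}$ and some constant $c$ such that:

$$(u, c_1, \ldots, c_r) \in \mathcal{M} \; \Rightarrow \; \zeta u + \sum_{j=1}^{r} \gamma_j c_j \ge c,$$
$$(\nu, c_1, \ldots, c_r) \in \mathcal{D} \; \Rightarrow \; \zeta \nu + \sum_{j=1}^{r} \gamma_j c_j \le c. \qquad (50)$$

We can thus conclude that $\zeta \ge 0$, $\gamma_j \ge 0$ and $\zeta \nu \le c$ for all $\nu < OPT_c$, which implies $\zeta \, OPT_c \le c$. Using these in (50), and using the fact that $\mathcal{J} \subset \mathcal{M}$, we see that for any $\{a_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2} \in \Gamma$, $\{\boldsymbol{x}_k\}_{k=1}^{r+2} \subset \mathcal{X}$, we have:

$$\zeta \, OPT_c \le c \le \zeta \mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) + \sum_{j=1}^{r} \gamma_j \big[ \mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) - \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \big]. \qquad (51)$$

Clearly, $\zeta \ne 0$, for otherwise we can plug in the actions $\{x_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ and probabilities $\{\vartheta_k^{(s_i)}\}_{i=1,2,\ldots}^{k=1,\ldots,r+2}$ in the slackness assumption (1) to obtain:

$$0 \ge \sum_{j=1}^{r} \gamma_j (-\eta) \ge 0,$$

which will imply $(\zeta, \gamma_1, \ldots, \gamma_r) = \boldsymbol{0}$. Thus we see that $\zeta > 0$. Now dividing both sides of (51) by $\zeta$, we have:

$$\mathcal{F}(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) + \sum_{j=1}^{r} \gamma_j' \big[ \mathcal{A}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) - \mathcal{B}_j(\{a_k^{(s_i)}, \boldsymbol{x}_k\}) \big] \ge OPT_c,$$

where $\gamma_j' = \gamma_j / \zeta$. This implies $g_c(\boldsymbol{\gamma}') \ge OPT_c$ with $\boldsymbol{\gamma}' = (\gamma_1', \ldots, \gamma_r')^T$. Hence $g_c^* \ge g_c(\boldsymbol{\gamma}') \ge OPT_c$, which by weak duality implies $g_c^* = OPT_c$, and so $g(\boldsymbol{\gamma}_V^*) = OPT_c$. ■

Combining Claims 1 and 2 we see that Theorem 1 follows. ■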